Deqing Wang

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

May 11, 2026
Via arXiv

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

May 11, 2026
Via arXiv

Policy Improvement Reinforcement Learning

Apr 01, 2026
Via arXiv

Heterogeneous Agent Collaborative Reinforcement Learning

Mar 03, 2026
Via arXiv

UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

Feb 27, 2026
Via arXiv

UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment

Feb 10, 2026
Via arXiv

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Feb 09, 2026
Via arXiv

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Feb 09, 2026
Via arXiv

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Feb 09, 2026
Via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026
Via arXiv